Deriving a Stationary Dynamic Bayesian Network 
from a Logic Program with Recursive Loops* 

Yi-Dong Shen 

Laboratory of Computer Science, Institute of Software 
Chinese Academy of Sciences, Beijing 100080, China 
Email: ydshen@ios.ac.cn 

Qiang Yang 

Department of Computing Science, Hong Kong University of Science and Technology 

Hong Kong, China 
Email: qyang@cs.ust.hk

Jia-Huai You and Li-Yan Yuan

Department of Computing Science, University of Alberta
Edmonton, Alberta, Canada T6G 2H1
Email: {you, yuan}@cs.ualberta.ca

Abstract

Recursive loops in a logic program present a challenging problem to the PLP framework.
On the one hand, they loop forever so that the PLP backward-chaining inferences
would never stop. On the other hand, they generate cyclic influences, which are disallowed
in Bayesian networks. Therefore, in existing PLP approaches logic programs with
recursive loops are considered to be problematic and thus are excluded. In this paper, 
we propose an approach that makes use of recursive loops to build a stationary dynamic 
Bayesian network. Our work stems from an observation that recursive loops in a logic 
program imply a time sequence and thus can be used to model a stationary dynamic 
Bayesian network without using explicit time parameters. We introduce a Bayesian 
knowledge base with logic clauses of the form $A \leftarrow A_1, \ldots, A_l, true, Context, Types$,
which naturally represents the knowledge that the $A_i$s have direct influences on $A$
in the context $Context$ under the type constraints $Types$. We then use the well-founded
model of a logic program to define the direct influence relation and apply
SLG-resolution to compute the space of random variables together with their parental
connections. We introduce a novel notion of influence clauses, based on which a declar- 
ative semantics for a Bayesian knowledge base is established and algorithms for building 
a two-slice dynamic Bayesian network from a logic program are developed. 

Key words: Probabilistic logic programming (PLP), the well-founded semantics, 
SLG-resolution, stationary dynamic Bayesian networks. 

1 Introduction 

Probabilistic logic programming (PLP) is a framework that extends the expressive power 
of Bayesian networks with first-order logic [20]. The core of the PLP framework is a
backward-chaining procedure, which generates a Bayesian network graphic structure from 
a logic program in a way quite like query evaluation in logic programming. Therefore, 
existing PLP methods use a slightly adapted SLD- or SLDNF-resolution [18] as the
backward-chaining procedure.

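Recursive loops in a logic program are SLD-derivations of the form

$$A_1 \leftarrow \cdots \leftarrow A_2 \leftarrow \cdots \leftarrow A_3 \cdots \qquad (1)$$

where for any $i \geq 1$, $A_i$ is the same as $A_{i+1}$ up to variable renaming.$^1$ Such loops present a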
challenging problem to the PLP framework. On the one hand, they loop forever so that the 
PLP backward-chaining inferences would never stop. On the other hand, they may generate 
cyclic influences, which are disallowed in Bayesian networks. 

Two representative approaches have been proposed to avoid recursive loops. The first 
one is by Ngo and Haddawy [20] and Kersting and De Raedt [17], who restrict to considering
only acyclic logic programs [1]. The second approach, proposed by Glesner and Koller [THj,
uses explicit time parameters to avoid the occurrence of recursive loops. It enforces acyclicity
by giving every predicate a time argument such that the
time argument in the clause head is at least one time step later than the time arguments of
the predicates in the clause body. In this way, each predicate $p(X)$ is changed to $p(X, T)$
and each clause $p(X) \leftarrow q(X)$ is rewritten into $p(X, T1) \leftarrow T2 = T1 - 1, q(X, T2)$, where
$T$, $T1$ and $T2$ are time parameters.
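
The rewriting just described can be sketched in a few lines of Python. The function name `add_time_arguments`, the tuple encoding of atoms and the `<-` rendering are assumptions made for this illustration only; they are not taken from the cited approach.

```python
from typing import List, Tuple

# An atom is (predicate symbol, argument names); this encoding is hypothetical.
Atom = Tuple[str, Tuple[str, ...]]


def add_time_arguments(head: Atom, body: List[Atom]) -> str:
    """Render head <- body with explicit time parameters T1 (head) and T2 (body)."""

    def show(atom: Atom, time_var: str) -> str:
        pred, args = atom
        return f"{pred}({', '.join(args + (time_var,))})"

    if not body:  # a fact is given an unconstrained time parameter T
        return show(head, "T")
    timed_body = ", ".join(show(atom, "T2") for atom in body)
    # The constraint T2 = T1 - 1 places the body one time step before the head.
    return f"{show(head, 'T1')} <- T2 = T1 - 1, {timed_body}"


rewritten = add_time_arguments(("p", ("X",)), [("q", ("X",))])
print(rewritten)
```

Because the head always carries a strictly later time than the body, no ground atom can depend on itself, which is how this style of encoding enforces acyclicity.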

$^1$ The left-most computation rule [18] is assumed in this paper.

In this paper, we propose a solution to the problem of recursive loops under the PLP
framework. Our method is not restricted to acyclic logic programs, nor does it rely on
explicit time parameters. Instead, it makes use of recursive loops to derive a stationary
dynamic Bayesian network. We make two novel contributions. First, we introduce the
well-founded semantics [33] of logic programs to the PLP framework; in particular, we use
the well-founded model of a logic program to define the direct influence relation and apply
SLG-resolution [5] (or SLTNF-resolution j2H]) to make the backward-chaining inferences.
As a result, termination of the PLP backward-chaining process is guaranteed. Second, we
observe that under the PLP framework recursive loops (cyclic influences) define feedbacks, 
thus implying a time sequence. For instance, the clause $aids(X) \leftarrow aids(Y), contact(X, Y)$
introduces recursive loops

$$aids(X) \leftarrow aids(Y) \ldots \leftarrow aids(Y1) \ldots$$

Together with some other clauses in a logic program, these recursive loops may generate
cyclic influences of the form

$$aids(p1) \leftarrow \ldots \leftarrow aids(p1) \ldots \leftarrow aids(p1) \ldots$$

Such cyclic influences represent feedback connections, i.e., that $p1$ is infected with aids (in
the current time slice $t$) depends on whether $p1$ was infected with aids earlier (in the last
time slice $t - 1$). Therefore, recursive loops of form (1) imply a time sequence of the form




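$$\underbrace{A}_{t} \leftarrow \cdots \leftarrow \underbrace{A}_{t-1} \leftarrow \cdots \leftarrow \underbrace{A}_{t-2} \cdots \qquad (2)$$

where $A$ is a ground instance of $A_i$. It is this observation that leads us to viewing a logic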
program with recursive loops as a special temporal model. Such a temporal model corre- 
sponds to a stationary dynamic Bayesian network and thus can be compactly represented 
as a two-slice dynamic Bayesian network. 

The paper is structured as follows. In Section 2, we review some concepts concerning 
Bayesian networks and logic programs. In Section 3, we introduce a new PLP formalism, 
called Bayesian knowledge bases. A Bayesian knowledge base consists mainly of a logic 
program that defines a direct influence relation over a space of random variables. In Section 
4, we establish a declarative semantics for a Bayesian knowledge base based on a key notion 
of influence clauses. Influence clauses contain only ground atoms from the space of random 
variables and define the same direct influence relation as the original Bayesian knowledge base 
does. In Section 5, we present algorithms for building a two-slice dynamic Bayesian network 
from a Bayesian knowledge base. We describe related work in Section 6 and summarize our 
work in Section 7. 

2 Preliminaries and Notation 

We assume the reader is familiar with basic ideas of Bayesian networks [21] and logic
programming [18]. In particular, we assume the reader is familiar with the well-founded semantics
[33] as well as SLG-resolution [5]. Here we review some basic concepts concerning dynamic
Bayesian networks (DBNs). DBNs are introduced to model the evolution of the state of the
environment over time ^H]. Briefly, a DBN is a Bayesian network whose random variables
are subscripted with time steps (basic units of time) or time slices (i.e. intervals). In this
paper, we use time slices. For instance, $Weather_{t-1}$, $Weather_t$ and $Weather_{t+1}$ are random
variables representing the weather situations in time slices $t - 1$, $t$ and $t + 1$, respectively.
We can then use a DBN to depict how $Weather_{t-1}$ influences $Weather_t$.

A DBN is represented by describing the intra-probabilistic relations between random
variables in each individual time slice $t$ ($t > 0$) and the inter-probabilistic relations between
the random variables of each two consecutive time slices $t - 1$ and $t$. If both the intra-
and inter-probabilistic relations are the same for all time slices (in this case, the DBN is a
repetition of a Bayesian network over time; see Figure 1), the DBN is called a stationary
DBN [21]; otherwise it is called a flexible DBN [T^]. As far as we know, most existing DBN
systems reported in the literature are stationary DBNs. 



In a stationary DBN as shown in Figure 1, the state evolution is determined by random
variables like C, B and A, as they appear periodically and influence one another over time
(i.e., they produce cycles of direct influences). Such variables are called state variables. Note
that D is not a state variable. Due to the characteristic of stationarity, a stationary DBN is
often compactly represented as a two-slice DBN.

Definition 2.1 A two-slice DBN for a stationary DBN consists of two consecutive time
slices, $t - 1$ and $t$, which describes (1) the intra-probabilistic relations between the random
variables in slice $t$ and (2) the inter-probabilistic relations between the random variables in
slice $t - 1$ and the random variables in slice $t$.

A two-slice DBN models a feedback system, where a cycle of direct influences establishes 
a feedback connection. For convenience, we depict feedback connections with dashed edges. 
Moreover, we refer to nodes coming from slice $t - 1$ as state input nodes (or state input
variables).$^2$
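
To make the relation between a two-slice DBN and the stationary DBN it stands for concrete, the following sketch unrolls a two-slice structure over several time slices. The edge lists are hypothetical and chosen only for illustration (they are not read off the paper's figures), and `unroll` is a name introduced here.

```python
from typing import List, Tuple

Edge = Tuple[str, str]        # (parent, child) between variable names
TimedNode = Tuple[str, int]   # (variable name, time slice index)


def unroll(intra: List[Edge], inter: List[Edge], slices: int) -> List[Tuple[TimedNode, TimedNode]]:
    """Repeat a two-slice structure over `slices` consecutive time slices."""
    edges = []
    for t in range(slices):
        # Intra-probabilistic relations: both endpoints live in slice t.
        edges += [((parent, t), (child, t)) for parent, child in intra]
        if t > 0:
            # Inter-probabilistic relations: parent in slice t-1, child in slice t.
            edges += [((parent, t - 1), (child, t)) for parent, child in inter]
    return edges


# Hypothetical edges, assumed for this example only.
intra_edges = [("C", "B"), ("B", "A"), ("B", "D")]
inter_edges = [("A", "C")]  # feedback connection: A in slice t-1 influences C in slice t

unrolled = unroll(intra_edges, inter_edges, slices=3)
print(len(unrolled))
```

Every edge of the unrolled network either stays inside one slice or points from slice $t - 1$ to slice $t$, so the result is acyclic even though A, C and B form a cycle of direct influences.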

Example 2.1 The stationary DBN of Figure 1 can be represented by a two-slice DBN as
shown in Figure 2, where A, C and B form a cycle of direct influences and thus establish
a feedback connection. This stationary DBN can also be represented by a two-slice DBN
starting from a different state input node such as $C_{t-1}$ or $B_{t-1}$. These two-slice DBN
structures are equivalent in the sense that they model the same cycle of direct influences and can
be unrolled into the same stationary DBN (Figure 1).

Figure 1: A stationary DBN structure.

Figure 2: A two-slice DBN structure (a feedback system).

Figure 3: A simplified two-slice DBN structure.

Observe that in a two-slice DBN, all random variables except state input nodes have the
same subscript $t$. In the sequel, the subscript $t$ is omitted for simplification of the structure.
For instance, the two-slice DBN of Figure 2 is simplified to that of Figure 3.

In the rest of this section, we introduce some necessary notation for logic programs. 
Variables begin with a capital letter, and predicate, function and constant symbols with 
a lower-case letter. We use $p(.)$ to refer to any predicate/atom whose predicate symbol is
$p$ and use $p(X)$ to refer to $p(X_1, \ldots, X_n)$ where all $X_i$s are variables. There is one special
predicate, $true$, which is always logically true. A predicate $p(X)$ is typed if its arguments $X$
are typed so that each argument takes on values in a well-defined finite domain. A (general) 
logic program $P$ is a finite set of clauses of the form

$$A \leftarrow B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n \qquad (3)$$

where $A$, the $B_i$s and $C_j$s are atoms. We use $HU(P)$ and $HB(P)$ to denote the Herbrand
universe and Herbrand base of $P$, respectively, and use $WF(P) = \langle I_t, I_f \rangle$ to denote the
well-founded model of $P$, where $I_t, I_f \subseteq HB(P)$, and every $A$ in $I_t$ is true and every $A$ in
$I_f$ is false in $WF(P)$. By a (Herbrand) ground instance of a clause/atom $G$ we refer to a
ground instance of $G$ that is obtained by replacing all variables in $G$ with some terms in
$HU(P)$.


A logic program $P$ is a positive logic program if no negative literal occurs in the body of
any clause. $P$ is a Datalog program if no clause in $P$ contains function symbols. $P$ is an acyclic
logic program if there is a mapping $map$ from the set of ground instances of atoms in $P$ into
the set of natural numbers such that for any ground instance $A \leftarrow B_1, \ldots, B_k, \neg B_{k+1}, \ldots, \neg B_n$
of any clause in $P$, $map(A) > map(B_i)$ ($1 \leq i \leq n$) [1]. $P$ is said to have the
bounded-term-size property w.r.t. a set of predicates $\{p_1(.), \ldots, p_t(.)\}$ if there is a function $f(n)$ such that
for any $1 \leq i \leq t$, whenever a top goal $G_0 = \leftarrow p_i(.)$ has no argument whose term size exceeds
$n$, no atoms in any SLDNF- (or SLG-) derivations for $G_0$ have an argument whose term size
exceeds $f(n)$ (this definition is adapted from [32]).

$^2$ When no confusion would occur, we will refer to nodes and random variables interchangeably.

3 Definition of a Bayesian Knowledge Base



In this section, we introduce a new PLP formalism, called Bayesian knowledge bases. Bayesian 
knowledge bases accommodate recursive loops and define the direct influence relation in 
terms of the well-founded semantics. 

Definition 3.1 A Bayesian knowledge base is a triple $\langle PB \cup CB, T_x, CR \rangle$, where

• $PB \cup CB$ is a logic program, each clause in $PB$ being of the form

$$p(.) \leftarrow \underbrace{p_1(.), \ldots, p_l(.)}_{\text{direct influences}}, true, \underbrace{B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n}_{\text{context}}, \underbrace{member(X_1, DOM_1), \ldots, member(X_s, DOM_s)}_{\text{type constraints}} \qquad (4)$$

where (i) the predicate symbols $p, p_1, \ldots, p_l$ only occur in $PB$ and (ii) $p(.)$ is typed so
that for each variable $X_i$ in it with a finite domain $DOM_i$ (a list of constants) there is
an atom $member(X_i, DOM_i)$ in the clause body.

• $T_x$ is a set of conditional probability tables (CPTs) of the form $P(p(.) \mid p_1(.), \ldots, p_l(.))$,
each being attached to a clause (4) in $PB$.

• $CR$ is a combination rule such as noisy-or, min or max [17].

A Bayesian knowledge base contains a logic program that can be divided into two
parts, $PB$ and $CB$. $PB$ defines a direct influence relation, each clause (4) saying that the
atoms $p_1(.), \ldots, p_l(.)$ have direct influences on $p(.)$ in the context that $B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$,
$member(X_1, DOM_1), \ldots, member(X_s, DOM_s)$ is true in $PB \cup CB$ under the well-founded
semantics. Note that the special literal $true$ is used in clause (4) to mark the beginning of
the context; it is always true in the well-founded model $WF(PB \cup CB)$. For each variable
$X_i$ in the head $p(.)$, $member(X_i, DOM_i)$ is used to enforce the type constraint on $X_i$, i.e. the
value of $X_i$ comes from its domain $DOM_i$. $CB$ assists $PB$ in defining the direct influence
relation by introducing some auxiliary predicates (such as $member(.)$) to describe contexts.$^3$
Clauses in $CB$ do not describe direct influences.

Recursive loops are allowed in $PB$ and $CB$. In particular, when some $p_i(.)$ in clause (4)
is the same as the head $p(.)$, a cyclic direct influence occurs. Such a cyclic influence models
a feedback connection and is interpreted as $p(.)$ at present depending on itself in the past.

In this paper, we focus on Datalog programs, although the proposed approach applies to
logic programs with the bounded-term-size property (w.r.t. the set of predicates appearing
in the heads of clauses in $PB$) as well. Datalog programs are widely used in database and
knowledge base systems [SJ and have a polynomial time complexity in computing their
well-founded models [33]. In the sequel, we assume that except for the predicate $member(.)$,
$PB \cup CB$ is a Datalog program.

$^3$ The predicate $true$ can be defined in $CB$ using a unit clause.

For each clause (4) in $PB$, there is a unique CPT, $P(p(.) \mid p_1(.), \ldots, p_l(.))$, in $T_x$ specifying
the degree of the direct influences. Such a CPT is shared by all instances of clause (4).
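
Definition 3.1 names noisy-or, min and max as possible combination rules. The sketch below shows only the standard formulas behind these three rules, applied to the probabilities that several clauses with the same head might propose for that head being true. The numbers are hypothetical, and how $CR$ is applied to full CPTs is not specified in this part of the paper, so this is an illustration rather than the paper's procedure.

```python
from math import prod
from typing import Callable, Dict, List


def noisy_or(probs: List[float]) -> float:
    """The head is false only if every contributing cause independently fails."""
    return 1.0 - prod(1.0 - p for p in probs)


COMBINATION_RULES: Dict[str, Callable[[List[float]], float]] = {
    "noisy-or": noisy_or,
    "min": min,
    "max": max,
}

# Hypothetical probabilities proposed by two clauses with the same head.
proposed = [0.5, 0.5]
combined = {name: rule(proposed) for name, rule in COMBINATION_RULES.items()}
print(combined)
```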

A Bayesian knowledge base has the following important property. 

Theorem 3.1 (1) All unit clauses in $PB$ are ground. (2) Let $G_0 = \leftarrow p(.)$ be a goal with $p$
being a predicate symbol occurring in the head of a clause in $PB$. Then all answers of $G_0$
derived from $PB \cup CB \cup \{G_0\}$ by applying SLG-resolution are ground.

Proof: (1) If the head of a clause in $PB$ contains variables, there must be atoms of the form
$member(X_i, DOM_i)$ in its body. This means that clauses whose head contains variables are
not unit clauses. Therefore, all unit clauses in $PB$ are ground.

(2) Let $A$ be an answer of $G_0$ obtained by applying SLG-resolution to $PB \cup CB \cup \{G_0\}$.
Then $A$ must be produced by applying a clause in $PB$ of form (4) with a most general
unifier (mgu) $\theta$ such that $A = p(.)\theta$ and the body
$(p_1(.), \ldots, p_l(.), true, B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n, member(X_1, DOM_1), \ldots, member(X_s, DOM_s))\theta$
is evaluated true in the well-founded model $WF(PB \cup CB)$. Note that the type constraints
$(member(X_1, DOM_1), \ldots, member(X_s, DOM_s))\theta$
being evaluated true by SLG-resolution guarantees that all variables $X_i$s in the head $p(.)$ are
instantiated by $\theta$ into constants in their domains $DOM_i$. This means that $A$ is ground. □

For the sake of simplicity, in the sequel for each clause (4) in $PB$, we omit its type
constraints $member(X_i, DOM_i)$ ($1 \leq i \leq s$). Therefore, when we say that the context
$B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ is true, we assume that the related type constraints are true as well.

Example 3.1 We borrow the well-known AIDS program from ^3] (a simplified version) as
a running example to illustrate our PLP approach. It is formulated by a Bayesian knowledge
base $KB_1$ with the following logic program:$^4$

$PB_1$:
  1. aids(p1).
  2. aids(p3).
  3. aids(X) ← aids(X).
  4. aids(X) ← aids(Y), contact(X, Y).
  5. contact(p1, p2).
  6. contact(p2, p1).

Note that both the 3rd and the 4th clause produce recursive loops. The 3rd clause also
has a cyclic direct influence. Conceptually, the two clauses model the fact that the direct
influences on $aids(X)$ come from whether $X$ was infected with aids earlier (the feedback
connection induced from the 3rd clause) or whether $X$ has contact with someone $Y$ who is
infected with aids (the 4th clause).

$^4$ This Bayesian knowledge base $KB_1 = \langle PB_1 \cup CB_1, T_{x_1}, CR_1 \rangle$ may well contain contexts that describe
a person's background information. The contexts together with $CB_1$, $T_{x_1}$ and $CR_1$ are omitted here for the
sake of simplicity.

4 Declarative Semantics 

In this section, we formally describe the space of random variables and the direct influence 
relation defined by a Bayesian knowledge base KB. We then define probability distributions 
induced by KB. 

4.1 Space of Random Variables and Influence Clauses 

A Bayesian knowledge base KB defines a direct influence relation over a subset of HB(PB). 
Recall that any random variable in a Bayesian network is either an input node (with no 
parent nodes) or a node on which some other nodes (i.e. its parent nodes) in the network 
have direct influences. Since an input node can be viewed as a node whose direct influences 
come from an empty set of parent nodes, we can define a space of random variables from a 
Bayesian knowledge base KB by taking all unit clauses in PB as input nodes and deriving 
the other nodes iteratively based on the direct influence relation defined by PB. Formally, 
we have 

Definition 4.1 The space of random variables of KB, denoted S(KB), is recursively defined 
as follows: 

1. All unit clauses in PB are random variables in S(KB). 

2. Let $A \leftarrow A_1, \ldots, A_l, true, B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ be a ground instance of a clause in
$PB$. If the context $B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ is true in the well-founded model $WF(PB \cup CB)$
and $\{A_1, \ldots, A_l\} \subseteq S(KB)$, then $A$ is a random variable in $S(KB)$. In this case,
each $A_i$ is said to have a direct influence on $A$.

3. S(KB) contains only those ground atoms satisfying the above two conditions. 

Definition 4.2 For any random variables A, B in S(KB), we say A is influenced by B if B 
has a direct influence on A, or for some C in S(KB) A is influenced by C and C is influenced 
by B. A cyclic influence occurs if A is influenced by itself. 
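
Definitions 4.1 and 4.2 can be read operationally. The sketch below applies them to the program $PB_1$ of Example 3.1 by naive grounding and fixpoint iteration. It assumes a positive program with no contexts (as in that example), so checking membership in $S(KB)$ is all that is needed; the tuple encoding of atoms and all function names are choices made for this illustration, not the paper's procedure.

```python
from itertools import product

# Clauses of PB_1 as (head, body); capitalised arguments are variables.
PB1 = [
    (("aids", "p1"), []),
    (("aids", "p3"), []),
    (("aids", "X"), [("aids", "X")]),
    (("aids", "X"), [("aids", "Y"), ("contact", "X", "Y")]),
    (("contact", "p1", "p2"), []),
    (("contact", "p2", "p1"), []),
]
CONSTANTS = ["p1", "p2", "p3"]


def ground_instances(head, body):
    """Yield every ground instance of a clause over CONSTANTS."""
    variables = sorted({t for atom in [head, *body] for t in atom[1:] if t[0].isupper()})
    for values in product(CONSTANTS, repeat=len(variables)):
        theta = dict(zip(variables, values))
        ground = [(atom[0], *(theta.get(t, t) for t in atom[1:])) for atom in [head, *body]]
        yield ground[0], ground[1:]


def space_and_influences(program):
    """Return S(KB) and the direct influence pairs (parent, child)."""
    space = {head for head, body in program if not body}  # unit clauses (item 1)
    direct, changed = set(), True
    while changed:
        changed = False
        for head, body in program:
            for g_head, g_body in ground_instances(head, body):
                if g_body and all(b in space for b in g_body):  # item 2
                    direct |= {(b, g_head) for b in g_body}
                    if g_head not in space:
                        space.add(g_head)
                        changed = True
    return space, direct


def cyclic_influences(direct):
    """Atoms influenced by themselves, via the transitive closure of `direct`."""
    reach, changed = set(direct), True
    while changed:
        new = {(a, d) for a, b in reach for c, d in reach if b == c} - reach
        changed = bool(new)
        reach |= new
    return {a for a, b in reach if a == b}


space, direct = space_and_influences(PB1)
cycles = cyclic_influences(direct)
print(sorted(space))
print(sorted(cycles))
```

Naive grounding is exponential in the number of variables per clause; it is used here only because the example is tiny.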

Example 4.1 (Example 3.1 continued) The clauses 1, 2, 5 and 6 are unit clauses, thus
random variables. $aids(p2)$ is then derived by applying the 4th clause. Consequently,
$S(KB_1) = \{aids(p1), aids(p2), aids(p3), contact(p1, p2), contact(p2, p1)\}$.
$aids(p1)$ and $aids(p2)$ have a
direct influence on each other. There are three cyclic influences: $aids(p_i)$ is influenced by
itself for each $i = 1, 2, 3$.

Let $WF(PB \cup CB) = \langle I_t, I_f \rangle$ be the well-founded model of $PB \cup CB$ and let
$I_{PB} = \{p(.) \in I_t \mid p \text{ occurs in the head of some clause in } PB\}$. The following result shows that the
space of random variables is uniquely determined by the well-founded model.

Theorem 4.1 $S(KB) = I_{PB}$.

Proof: First note that all unit clauses in $PB$ are both in $S(KB)$ and in $I_{PB}$. We prove this
theorem by induction on the maximum depth $d \geq 0$ of backward derivations of a random
variable $A$.

($\Rightarrow$) Let $A \in S(KB)$. When $d = 0$, $A$ is a unit clause in $PB$, so $A \in I_{PB}$. For
the induction step, assume $B \in I_{PB}$ for any $B \in S(KB)$ whose maximum depth $d$ of
backward derivations is below $k$. Let $d = k$ for $A$. There must be a ground instance
$A \leftarrow A_1, \ldots, A_l, true, B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ of a clause in $PB$ such that the $A_i$s are already
in $S(KB)$ and $B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ is true in the well-founded model $WF(PB \cup CB)$.
Since the head $A$ is derived from the $A_i$s in the body, the maximum depth for each $A_i$ must
be below the depth $k$ for the head $A$. By the induction hypothesis, the $A_i$s are in $I_{PB}$. By
definition of the well-founded model, $A$ is true in $WF(PB \cup CB)$ and thus $A \in I_{PB}$.

($\Leftarrow$) Let $A \in I_{PB}$. When $d = 0$, $A$ is a unit clause in $PB$, so $A \in S(KB)$. For
the induction step, assume $B \in S(KB)$ for any $B \in I_{PB}$ whose maximum depth $d$ of
backward derivations is below $k$. Let $d = k$ for $A$. There must be a ground instance
$A \leftarrow A_1, \ldots, A_l, true, \ldots$ of a clause in $PB$ such that the body is true in $WF(PB \cup CB)$. Note
that the predicate symbol of each $A_i$ occurs in the head of a clause in $PB$. Since the head
$A$ is derived from the literals in the body, the maximum depth of backward derivations for
each $A_i$ in the body must be below the depth $k$ for the head $A$. By the induction hypothesis,
the $A_i$s are in $S(KB)$. By Definition 4.1, $A \in S(KB)$. □

Theorem 4.1 suggests that the space of random variables can be computed by applying an
existing procedure for the well-founded model such as SLG-resolution or SLTNF-resolution.
Since SLG-resolution has been implemented as the well-known XSB system j2S], in this
paper we apply it for the PLP backward-chaining inferences. SLG-resolution is a tabling
mechanism for top-down computation of the well-founded model. For any atom $A$, during
the process of evaluating a goal $\leftarrow A$, SLG-resolution stores all answers of $A$ in a space called
a table, denoted $T_A$.

Let $\{p_1, \ldots, p_t\}$ be the set of predicate symbols occurring in the heads of clauses in $PB$,
and let $GS_0 = \{\leftarrow p_1(X_1), \ldots, \leftarrow p_t(X_t)\}$.

Algorithm 1: Computing random variables.

1. $S'(KB) = \emptyset$.

2. For each $\leftarrow p_i(X_i)$ in $GS_0$

(a) Compute the goal $\leftarrow p_i(X_i)$ by applying SLG-resolution to $PB \cup CB \cup \{\leftarrow p_i(X_i)\}$.

(b) $S'(KB) = S'(KB) \cup T_{p_i(X_i)}$.

3. Return $S'(KB)$.

Theorem 4.2 Algorithm 1 terminates, yielding a finite set $S'(KB) = S(KB)$.

Proof: Let $WF(PB \cup CB) = \langle I_t, I_f \rangle$ be the well-founded model of $PB \cup CB$. By the
soundness and completeness of SLG-resolution, Algorithm 1 will terminate with a finite
output $S'(KB)$ that consists of all answers of $p_i(X_i)$ ($1 \leq i \leq t$). By Theorem 3.1, all answers
in $S'(KB)$ are ground. This means $S'(KB) = I_{PB}$. Hence, by Theorem 4.1, $S'(KB) = S(KB)$. □

We introduce the following principal concept. 

Definition 4.3 Let $A \leftarrow A_1, \ldots, A_l, true, B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ be a ground instance of
the $k$-th clause in $PB$ such that its body is true in the well-founded model $WF(PB \cup CB)$.
We call

$$k.\ A \leftarrow A_1, \ldots, A_l \qquad (5)$$

an influence clause.$^5$ All influence clauses derived from all clauses in $PB$ constitute the set
of influence clauses of $KB$, denoted $I_{clause}(KB)$.

The following result is immediate from Definition 4.1 and Theorem 4.1.

Theorem 4.3 For any influence clause (5), $A$ and all $A_i$s are random variables in $S(KB)$.

Influence clauses have the following principal property. 

Theorem 4.4 For any $A_i$ and $A$ in $HB(PB)$, $A_i$ has a direct influence on $A$, which is
derived from the $k$-th clause in $PB$, if and only if there is an influence clause in $I_{clause}(KB)$
of the form $k.\ A \leftarrow A_1, \ldots, A_i, \ldots, A_l$.

Proof: ($\Rightarrow$) Assume $A_i$ has a direct influence on $A$, which is derived from the $k$-th
clause in $PB$. By Definition 4.1, the $k$-th clause has a ground instance of the form
$A \leftarrow A_1, \ldots, A_i, \ldots, A_l, true, B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ such that $B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ is true in
$WF(PB \cup CB)$ and $\{A_1, \ldots, A_i, \ldots, A_l\} \subseteq S(KB)$. By Theorem 4.1, $A_1, \ldots, A_i, \ldots, A_l$ is true
in $WF(PB \cup CB)$. Thus, $k.\ A \leftarrow A_1, \ldots, A_i, \ldots, A_l$ is an influence clause in $I_{clause}(KB)$.

($\Leftarrow$) Assume that $I_{clause}(KB)$ contains an influence clause $k.\ A \leftarrow A_1, \ldots, A_i, \ldots, A_l$.
Then the $k$-th clause in $PB$ has a ground instance of the form
$A \leftarrow A_1, \ldots, A_i, \ldots, A_l, true, B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$
such that its body is true in $WF(PB \cup CB)$ and (by Theorem 4.3)
$\{A_1, \ldots, A_i, \ldots, A_l\} \subseteq S(KB)$. By Definition 4.1, $A \in S(KB)$ and $A_i$ has a direct influence
on $A$. □

The following result is immediate from Theorem 4.4.

Corollary 4.5 For any atom $A$, $A$ is in $S(KB)$ if and only if there is an influence clause
in $I_{clause}(KB)$ whose head is $A$.

$^5$ The prefix "k." would be omitted sometimes for the sake of simplicity.

Theorem 4.4 shows the significance of influence clauses: they define the same direct
influence relation over the same space of random variables as the original Bayesian knowledge
base does. Therefore, a Bayesian network can be built directly from $I_{clause}(KB)$ provided
the influence clauses are available.

Observe that to compute the space of random variables (see Algorithm 1), SLG-resolution
will construct a proof tree rooted at the goal $\leftarrow p_i(X_i)$ for each $1 \leq i \leq t$ [5]. For each
answer $A$ of $p_i(X_i)$ in $S(KB)$ there must be a success branch (i.e. a branch starting at the
root node and ending at a node marked with success) in the tree that generates the
answer. Let $p_i(.) \leftarrow A_1, \ldots, A_l, true, \ldots$ be the $k$-th clause in $PB$ that is applied to expand the
root goal $\leftarrow p_i(X_i)$ in the success branch and let $\theta$ be the composition of all mgus along
the branch. Then $A = p_i(.)\theta$ and the body $A_1, \ldots, A_l, true, \ldots$ is evaluated true, with the
mgu $\theta$, in $WF(PB \cup CB)$ by SLG-resolution. This means that for each $1 \leq j \leq l$, $A_j\theta$ is
an answer of $A_j$ that is derived by applying SLG-resolution to $PB \cup CB \cup \{\leftarrow A'_j\}$ where
$A'_j$ is $A_j$ or some instance of $A_j$. By Theorem 3.1, all $A_j\theta$s are ground atoms. Therefore,
$k.\ p_i(.)\theta \leftarrow A_1\theta, \ldots, A_l\theta$ is an influence clause. Hence we have the following result.

Theorem 4.6 Let $B_r$ be a success branch in a proof tree of SLG-resolution,
$p_i(.) \leftarrow A_1, \ldots, A_l, true, \ldots$ be the $k$-th clause in $PB$ that expands the root goal in $B_r$, and $\theta$ be the composition
of all mgus along $B_r$. $B_r$ produces an influence clause $k.\ p_i(.)\theta \leftarrow A_1\theta, \ldots, A_l\theta$.

Every success branch in a proof tree for a goal in $GS_0$ produces an influence clause. The
set of influence clauses can then be obtained by collecting all influence clauses from all such
proof trees in SLG-resolution.

Algorithm 2: Computing influence clauses.

1. For each goal $\leftarrow p_i(X_i)$ in $GS_0$, compute all answers of $p_i(X_i)$ by applying
SLG-resolution to $PB \cup CB \cup \{\leftarrow p_i(X_i)\}$ while for each success branch starting at the root
goal $\leftarrow p_i(X_i)$, collecting an influence clause from the branch into $I'_{clause}(KB)$.

2. Return $I'_{clause}(KB)$.

Theorem 4.7 Algorithm 2 terminates, yielding a finite set $I'_{clause}(KB) = I_{clause}(KB)$.

Proof: That Algorithm 2 terminates is immediate from Theorem 4.2, as except for collecting
influence clauses, Algorithm 2 makes the same derivations as Algorithm 1. The termination
of Algorithm 2 then implies $I'_{clause}(KB)$ is finite.

By Theorem 4.6, any clause in $I'_{clause}(KB)$ is an influence clause in $I_{clause}(KB)$. We
now prove the converse. Let $k.\ A \leftarrow A_1, \ldots, A_l$ be an influence clause in $I_{clause}(KB)$. Then
the $k$-th clause in $PB$, $A' \leftarrow A'_1, \ldots, A'_l, true, \ldots$, has a ground instance of the form
$A \leftarrow A_1, \ldots, A_l, true, \ldots$ whose body is true in $WF(PB \cup CB)$. By the completeness of
SLG-resolution, there must be a success branch in the proof tree rooted at a goal $\leftarrow p_i(X_i)$ in $GS_0$
where (1) the root goal is expanded by the $k$-th clause, (2) the composition of all mgus along
the branch is $\theta$, and (3) $A \leftarrow A_1, \ldots, A_l, true, \ldots$ is an instance of $(A' \leftarrow A'_1, \ldots, A'_l, true, \ldots)\theta$.
By Theorem 4.6, $k.\ A'\theta \leftarrow A'_1\theta, \ldots, A'_l\theta$ is an influence clause. Since any influence clause is
ground, $k.\ A'\theta \leftarrow A'_1\theta, \ldots, A'_l\theta$ is the same as $k.\ A \leftarrow A_1, \ldots, A_l$. This influence clause from
the success branch will be collected into $I'_{clause}(KB)$ by Algorithm 2. Thus, any clause in
$I_{clause}(KB)$ is in $I'_{clause}(KB)$. □

Example 4.2 (Example 4.1 continued) There are two predicate symbols, $aids$ and $contact$,
in the heads of clauses in $PB_1$. Let $GS_0 = \{\leftarrow aids(X), \leftarrow contact(Y, Z)\}$. Algorithm 2 will
generate two proof trees rooted at $\leftarrow aids(X)$ and $\leftarrow contact(Y, Z)$, respectively, as shown
in Figures 4 and 5. In the proof trees, a label $C_i$ on an edge indicates that the $i$-th clause
in $PB$ is applied, and the other labels like $X = p1$ on an edge show that an answer from
a table is applied. Each success branch yields an influence clause. For instance, expanding
the root goal $\leftarrow aids(X)$ by the 3rd clause produces a child node $\leftarrow aids(X)$ (Figure 4).
Then applying the answers of $aids(X)$ from the table $T_{aids(X)}$ to the goal of this node leads
to three success branches. Applying the mgu $\theta$ on each success branch to the 3rd clause
yields three influence clauses of the form 3. $aids(p_i) \leftarrow aids(p_i)$ ($i = 1, 2, 3$). As a result, we
obtain the following set of influence clauses:

$I_{clause}(KB_1)$:
  1. aids(p1).
  2. aids(p3).
  3. aids(p1) ← aids(p1).
  3. aids(p2) ← aids(p2).
  3. aids(p3) ← aids(p3).
  4. aids(p2) ← aids(p1), contact(p2, p1).
  4. aids(p1) ← aids(p2), contact(p1, p2).
  5. contact(p1, p2).
  6. contact(p2, p1).
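
The set above can be reproduced without proof trees. The sketch below is a bottom-up stand-in for Algorithm 2, not an implementation of SLG-resolution: it assumes a positive program, for which the atoms true in the well-founded model coincide with the least fixpoint, and then applies Definition 4.3 by enumerating ground instances whose bodies are true. The encoding and function names are assumptions of this illustration.

```python
from itertools import product

# PB_1 from Example 3.1 as (head, body); capitalised arguments are variables.
PB1 = [
    (("aids", "p1"), []),
    (("aids", "p3"), []),
    (("aids", "X"), [("aids", "X")]),
    (("aids", "X"), [("aids", "Y"), ("contact", "X", "Y")]),
    (("contact", "p1", "p2"), []),
    (("contact", "p2", "p1"), []),
]
CONSTANTS = ["p1", "p2", "p3"]


def ground_instances(head, body):
    """Yield every ground instance of a clause over CONSTANTS."""
    variables = sorted({t for atom in [head, *body] for t in atom[1:] if t[0].isupper()})
    for values in product(CONSTANTS, repeat=len(variables)):
        theta = dict(zip(variables, values))
        ground = [(atom[0], *(theta.get(t, t) for t in atom[1:])) for atom in [head, *body]]
        yield ground[0], ground[1:]


def true_atoms(program):
    """Least fixpoint of a positive program (stands in for the well-founded model)."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in program:
            for g_head, g_body in ground_instances(head, body):
                if g_head not in model and all(b in model for b in g_body):
                    model.add(g_head)
                    changed = True
    return model


def influence_clauses(program):
    """Collect (k, head, body) for each ground instance of clause k with a true body."""
    model = true_atoms(program)
    clauses = []
    for k, (head, body) in enumerate(program, start=1):
        for g_head, g_body in ground_instances(head, body):
            if all(b in model for b in g_body):
                clauses.append((k, g_head, tuple(g_body)))
    return clauses


clauses = influence_clauses(PB1)
for k, head, body in clauses:
    print(k, head, "<-", body)
```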

For the computational complexity, we observe that the cost of Algorithm 2 is dominated
by applying SLG-resolution to evaluate the goals in $GS_0$. It has been shown that for a Datalog
program $P$, the time complexity of computing the well-founded model $WF(P)$ is polynomial.
More precisely, the time complexity of SLG-resolution is $O(|P| \cdot N^{\Pi_P + 1} \cdot \log N)$,
where $|P|$ is the number of clauses in $P$, $\Pi_P$ is the maximum number of literals in the body
of a clause, and $N$, the number of atoms of predicates in $P$ that are not variants of each
other, is a polynomial in the number of ground unit clauses in $P$.

$PB \cup CB$ is a Datalog program except for the $member(X_i, DOM_i)$ predicates (see
Definition 3.1). Since each domain $DOM_i$ is a finite list of constants, checking if $X_i$ is
in $DOM_i$ takes time linear in the size of $DOM_i$. Let $K_1$ be the maximum number of
$member(X_i, DOM_i)$ predicates used in a clause in $P$ and $K_2$ be the maximum size of a
domain $DOM_i$. Then the time of handling all $member(X_i, DOM_i)$ predicates in a clause
is bounded by $K_1 \cdot K_2$. Since each clause in $P$ is applied at most $N$ times in
SLG-resolution, the time of handling all $member(X_i, DOM_i)$s in all clauses in $P$ is bounded by
$|P| \cdot N \cdot K_1 \cdot K_2$. This is also a polynomial, hence SLG-resolution computes the well-founded
model $WF(PB \cup CB)$ in polynomial time. Therefore, we have the following result.

Figure 4: The proof tree for $\leftarrow aids(X)$.

Figure 5: The proof tree for $\leftarrow contact(Y, Z)$.

Theorem 4.8 The time complexity of Algorithm 2 is polynomial.

4.2 Probability Distributions Induced by KB

For any random variable $A$, we use $pa(A)$ to denote the set of random variables that have
direct influences on $A$; namely $pa(A)$ consists of random variables in the body of all influence
clauses whose head is $A$. Assume that the probability distribution $P(A \mid pa(A))$ is available
(see Section 5). Furthermore, we make the following independence assumption.

Assumption 1 For any random variable A, we assume that given pa(A), A is probabilisti- 
cally independent of all random variables in S(KB) that are not influenced by A. 

We define probability distributions induced by KB in terms of whether there are cyclic 
influences. 

Definition 4.4 When no cyclic influence occurs, the probability distribution induced by 
KB is P(S(KB)). 

Theorem 4.9 $P(S(KB)) = \prod_{A_i \in S(KB)} P(A_i \mid pa(A_i))$ under the independence
assumption.

Proof: When no cyclic influence occurs, the random variables in S(KB) can be arranged in
a partial order such that if $A_i$ is influenced by $A_j$ then $j > i$. By the independence
assumption, we have
$P(S(KB)) = P(\bigwedge_{A_i \in S(KB)} A_i) = P(A_1 \mid \bigwedge_{i=2} A_i) * P(\bigwedge_{i=2} A_i)$
$= P(A_1 \mid pa(A_1)) * P(A_2 \mid \bigwedge_{i=3} A_i) * P(\bigwedge_{i=3} A_i) = \ldots = \prod_{A_i \in S(KB)} P(A_i \mid pa(A_i))$. □
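
The factorization in Theorem 4.9 can be checked numerically on a small acyclic example. The following is only a sketch: the three Boolean variables `a`, `b`, `c`, the influence structure `a <- b, c`, and all CPT values are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical acyclic influence structure: a <- b, c (b and c have no parents).
parents = {"a": ("b", "c"), "b": (), "c": ()}

# Illustrative CPTs: probability that a variable is True given its parents' values.
cpt = {
    "b": {(): 0.3},
    "c": {(): 0.6},
    "a": {(True, True): 0.9, (True, False): 0.5,
          (False, True): 0.4, (False, False): 0.1},
}

def joint(assignment):
    """P(S(KB)) for one full assignment, as the product of P(A_i | pa(A_i))."""
    p = 1.0
    for var, pa in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

variables = list(parents)
# Summing the product form over all assignments should give 1.
total = sum(joint(dict(zip(variables, values)))
            for values in product([True, False], repeat=len(variables)))
```

Because each factor is a proper conditional distribution, the products sum to one over all assignments, which is what makes the product form a joint distribution.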

When there are cyclic influences, we cannot have a partial order on S(KB). By Definition 4.2
and Theorem 4.4, any cyclic influence, say "$A_1$ is influenced by itself," must result
from a set of influence clauses in $\mathcal{I}_{clause}(KB)$ of the form

    $A_1 \leftarrow \ldots, A_2, \ldots$
    $A_2 \leftarrow \ldots, A_3, \ldots$
    $\ldots$                                                        (6)
    $A_n \leftarrow \ldots, A_1, \ldots$

These influence clauses generate a chain (cycle) of direct influences

    $A_1 \leftarrow A_2 \leftarrow A_3 \leftarrow \ldots \leftarrow A_n \leftarrow A_1$          (7)

which defines a feedback connection. Since a feedback system can be modeled by a two-slice
DBN, as discussed earlier, the above influence clauses represent the same knowledge as the
following ones do:

    $A_1 \leftarrow \ldots, A_2, \ldots$
    $A_2 \leftarrow \ldots, A_3, \ldots$
    $\ldots$                                                        (8)
    $A_n \leftarrow \ldots, A_{1_{t-1}}, \ldots$

Here the $A_i$s are state variables and $A_{1_{t-1}}$ is a state input variable. As a result, $A_1$ being
influenced by itself becomes $A_1$ being influenced by $A_{1_{t-1}}$. By applying this transformation
(from influence clauses (6) to (8)), we can get rid of all cyclic influences and obtain a
generalized set $\mathcal{I}_{clause}(KB)_g$ of influence clauses from $\mathcal{I}_{clause}(KB)$.$^6$

Example 4.3 (Example 4.2 continued) $\mathcal{I}_{clause}(KB_1)$ can be transformed to the following
generalized set of influence clauses by introducing three state input variables $aids(p1)_{t-1}$,
$aids(p2)_{t-1}$ and $aids(p3)_{t-1}$.

6 Depending on which influence clause we start from when generating an influence cycle, a
different generalized set containing different state input variables would be obtained. All of
them are equivalent in the sense that they define the same feedbacks (cycles of direct
influences) and can be unrolled into the same stationary DBN.

$\mathcal{I}_{clause}(KB_1)_g$:
    1. aids(p1).
    2. aids(p3).
    3. aids(p1) <- aids(p1)_{t-1}.
    3. aids(p2) <- aids(p2)_{t-1}.
    3. aids(p3) <- aids(p3)_{t-1}.
    4. aids(p2) <- aids(p1)_{t-1}, contact(p2, p1).
    4. aids(p1) <- aids(p2), contact(p1, p2).
    5. contact(p1, p2).
    6. contact(p2, p1).

When there is no cyclic influence, KB is a non-temporal model, represented by $\mathcal{I}_{clause}(KB)$.
When cyclic influences occur, however, KB becomes a temporal model, represented by
$\mathcal{I}_{clause}(KB)_g$. Let $S(KB)_g$ be S(KB) plus all state input variables introduced in
$\mathcal{I}_{clause}(KB)_g$.

Definition 4.5 When there are cyclic influences, the probability distribution induced by
KB is $P(S(KB)_g)$.

By extending the independence assumption from S(KB) to $S(KB)_g$, we obtain the
following result.

Theorem 4.10 $P(S(KB)_g) = \prod_{A_i \in S(KB)} P(A_i \mid pa(A_i))$ under the independence
assumption.

Proof: Since $\mathcal{I}_{clause}(KB)_g$ produces no cyclic influences, the random variables in $S(KB)_g$
can be arranged in a partial order such that if $A_i$ is influenced by $A_j$ then $j > i$. The proof
then proceeds in the same way as that of Theorem 4.9. □

5 Building a Bayesian Network from a Bayesian Knowledge Base

5.1 Building a Two-Slice DBN Structure 

From a Bayesian knowledge base KB, we can derive a set of influence clauses $\mathcal{I}_{clause}(KB)$,
which defines the same direct influence relation over the same space S(KB) of random
variables as $PB \cup CB$ does (see Theorem 4.4). Therefore, given a probabilistic query together
with some evidences, we can depict a network structure from $\mathcal{I}_{clause}(KB)$, which covers the
random variables in the query and evidences, by backward-chaining the related random
variables via the direct influence relation.

Let Q be a probabilistic query and E a set of evidences, where all random variables
come from S(KB) (i.e., they are heads of some influence clauses in $\mathcal{I}_{clause}(KB)$). Let TOP
consist of these random variables. An influence network of Q and E,$^7$ denoted
$\mathcal{I}_{net}(KB)_{Q,E}$, is constructed from $\mathcal{I}_{clause}(KB)$ using the following algorithm.

Algorithm 3: Building an influence network.

1. Initially, $\mathcal{I}_{net}(KB)_{Q,E}$ has all random variables in TOP as nodes.

2. Remove the first random variable A from TOP. For each influence clause in $\mathcal{I}_{clause}(KB)$
of the form $k.\ A \leftarrow A_1, \ldots, A_l$, if $l = 0$ then add to $\mathcal{I}_{net}(KB)_{Q,E}$ an edge
$A \stackrel{k}{\leftarrow}$. Otherwise, for each $A_j$ in the body

(a) If $A_j$ is not in $\mathcal{I}_{net}(KB)_{Q,E}$ then add $A_j$ to $\mathcal{I}_{net}(KB)_{Q,E}$ as a new node and add
it to the end of TOP.

(b) Add to $\mathcal{I}_{net}(KB)_{Q,E}$ an edge $A \stackrel{k}{\leftarrow} A_j$.

3. Repeat step 2 until TOP becomes empty.

4. Return $\mathcal{I}_{net}(KB)_{Q,E}$.
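
The steps of Algorithm 3 can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' implementation. The tuple encoding `(k, head, body)`, the function name `build_influence_network`, and the clause list `KB1_CLAUSES` are assumptions; the list is reconstructed from the generalized set in Example 4.3 by dropping the $t-1$ subscripts.

```python
from collections import deque

# Assumed encoding of the influence clauses of KB_1: (k, head, body).
# Reconstructed from Example 4.3 by dropping the t-1 subscripts.
KB1_CLAUSES = [
    (1, "aids(p1)", ()),
    (2, "aids(p3)", ()),
    (3, "aids(p1)", ("aids(p1)",)),
    (3, "aids(p2)", ("aids(p2)",)),
    (3, "aids(p3)", ("aids(p3)",)),
    (4, "aids(p2)", ("aids(p1)", "contact(p2,p1)")),
    (4, "aids(p1)", ("aids(p2)", "contact(p1,p2)")),
    (5, "contact(p1,p2)", ()),
    (6, "contact(p2,p1)", ()),
]

def build_influence_network(clauses, top):
    """Return (nodes, edges). An edge (a, k, b) stands for a <-k- b;
    b is None for an edge generated by a unit clause."""
    nodes = list(top)                 # step 1: TOP variables are the initial nodes
    queue = deque(top)
    edges = set()
    while queue:                      # step 3: repeat until TOP is empty
        a = queue.popleft()           # step 2: remove the first variable from TOP
        for k, head, body in clauses:
            if head != a:
                continue
            if not body:              # unit clause (l = 0)
                edges.add((a, k, None))
                continue
            for b in body:
                if b not in nodes:    # step 2(a): new node, appended to TOP
                    nodes.append(b)
                    queue.append(b)
                edges.add((a, k, b))  # step 2(b)
    return nodes, edges

nodes, edges = build_influence_network(
    KB1_CLAUSES, ["aids(p1)", "aids(p2)", "aids(p3)"])
```

On this input the network keeps the cyclic influences as loops, for example the self-loop on `aids(p1)` and the two-node loop between `aids(p1)` and `aids(p2)`.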

Example 5.1 (Example 4.2 continued) To build an influence network from $KB_1$ that
covers aids(p1), aids(p2) and aids(p3), we apply Algorithm 3 to $\mathcal{I}_{clause}(KB_1)$ while letting
TOP = {aids(p1), aids(p2), aids(p3)}. It generates an influence network $\mathcal{I}_{net}(KB_1)_{Q,E}$ as
shown in Figure 6.



An influence network is a graphical representation for influence clauses. This claim is 
supported by the following properties of influence networks. 

Theorem 5.1 For any $A_i, A_j$ in $\mathcal{I}_{net}(KB)_{Q,E}$, $A_j$ is a parent node of $A_i$, connected via an
edge $A_i \stackrel{k}{\leftarrow} A_j$, if and only if there is an influence clause of the form
$k.\ A_i \leftarrow A_1, \ldots, A_j, \ldots, A_l$ in $\mathcal{I}_{clause}(KB)$.

7 Note the differences between influence networks and influence diagrams. Influence diagrams (also known
as decision networks) are a formalism introduced in decision theory that extends Bayesian networks by
incorporating actions and utilities [24].





Figure 6: An influence network built from the AIDS program $KB_1$.

Proof: First note that termination of Algorithm 3 is guaranteed by the fact that any random
variable in S(KB) will be added to TOP no more than one time (line 2(a)). Let $A_i, A_j$ be
nodes in $\mathcal{I}_{net}(KB)_{Q,E}$. If $A_j$ is a parent node of $A_i$, connected via an edge
$A_i \stackrel{k}{\leftarrow} A_j$, this edge must be added at line 2(b) due to applying an influence clause in
$\mathcal{I}_{clause}(KB)$ of the form $k.\ A_i \leftarrow A_1, \ldots, A_j, \ldots, A_l$ (line 2). Conversely, if
$\mathcal{I}_{clause}(KB)$ contains such an influence clause, it must be applied at line 2, with edges of the
form $A_i \stackrel{k}{\leftarrow} A_j$ added to the network at line 2(b). □

Theorem 5.2 For any $A_i, A_j$ in $\mathcal{I}_{net}(KB)_{Q,E}$, $A_i$ is a descendant node of $A_j$ if and only if
$A_i$ is influenced by $A_j$.

Proof: Assume $A_i$ is a descendant node of $A_j$, with a path

    $A_i \stackrel{k}{\leftarrow} B_1 \stackrel{k_1}{\leftarrow} \ldots B_m \stackrel{k_m}{\leftarrow} A_j$          (9)

By Theorem 5.1, $\mathcal{I}_{clause}(KB)$ must contain the following influence clauses

    $k.\ A_i \leftarrow \ldots, B_1, \ldots$
    $k_1.\ B_1 \leftarrow \ldots, B_2, \ldots$
    $\ldots$                                                        (10)
    $k_m.\ B_m \leftarrow \ldots, A_j, \ldots$

By Theorem 4.4 and Definition 4.2, $A_i$ is influenced by $A_j$. Conversely, if $A_i$ is influenced
by $A_j$, there must be a chain of influence clauses of the form as above. Since $A_i, A_j$ are in
$\mathcal{I}_{net}(KB)_{Q,E}$, by Theorem 5.1 there must be a path of form (9) in the network. □

Theorem 5.3 Let V be the set of nodes in $\mathcal{I}_{net}(KB)_{Q,E}$ and let $W = \{A_j \in S(KB) \mid$ for
some $A_i \in TOP$, $A_i$ is influenced by $A_j\}$. Then $V = TOP \cup W$.$^8$

Proof: That $\mathcal{I}_{net}(KB)_{Q,E}$ covers all random variables in TOP follows from line 1 of
Algorithm 3. We first prove that if $A_j \in W$ then $A_j \in V$. Assume $A_j \in W$. There must be a
chain of influence clauses of form (10) with $A_i \in TOP$. In this case, $B_1, B_2, \ldots, B_m, A_j$ will
be recursively added to the network (line 2(a)). Thus $A_j \in V$. We then prove that if $A_j \in V$
and $A_j \notin TOP$ then $A_j \in W$. Assume $A_j \in V$ and $A_j \notin TOP$. Then $A_j$ cannot have been added
to V at line 1. Instead, it is added to V at line 2(a). This means that for some $A_i \in TOP$, $A_i$ is a
descendant of $A_j$. By Theorem 5.2, $A_i$ is influenced by $A_j$. Hence $A_j \in W$. □

Theorem 4.9 shows that the probability distribution induced by KB can be computed
over $\mathcal{I}_{clause}(KB)$. Let $\mathcal{I}_{net}(KB)_{S(KB)}$ denote an influence network that covers all
random variables in S(KB). We show that the same distribution can be computed over
$\mathcal{I}_{net}(KB)_{S(KB)}$. For any node $A_i$ in $\mathcal{I}_{net}(KB)_{S(KB)}$, let $parents(A_i)$ denote the set of
parent nodes of $A_i$ in the network. Observe the following facts: First, by Theorem 5.1,
$parents(A_i) = pa(A_i)$. Second, by Theorem 5.2, $A_i$ is a descendant node of $A_j$ in
$\mathcal{I}_{net}(KB)_{S(KB)}$ if and only if $A_i$ is influenced by $A_j$ in $\mathcal{I}_{clause}(KB)$. This means that the
independence assumption (Assumption 1) applies to $\mathcal{I}_{net}(KB)_{S(KB)}$ as well, and that
$\mathcal{I}_{clause}(KB)$ produces a cycle of direct influences if and only if $\mathcal{I}_{net}(KB)_{S(KB)}$ contains the
same (direct) loop. Combining these facts leads to the following immediate result.

8 This result suggests that an influence network is similar to a supporting network introduced in [20].

Theorem 5.4 When no cyclic influence occurs, the probability distribution induced by KB
can be computed over $\mathcal{I}_{net}(KB)_{S(KB)}$. That is,
$P(S(KB)) = \prod_{A_i \in S(KB)} P(A_i \mid pa(A_i)) = \prod_{A_i \in S(KB)} P(A_i \mid parents(A_i))$
under the independence assumption.

Theorem 5.4 implies that an influence network without loops is a Bayesian network
structure. Let us consider influence networks with loops. By Theorem 5.2, loops in an
influence network are generated from recursive influence clauses of form (6) and thus they
depict feedback connections of form (7). This means that an influence network with loops
can be converted into a two-slice DBN, simply by converting each loop of the form

    $A_1 \stackrel{k_1}{\leftarrow} A_2 \stackrel{k_2}{\leftarrow} \ldots \stackrel{k_{n-1}}{\leftarrow} A_n \stackrel{k_n}{\leftarrow} A_1$

into a two-slice DBN path

    $A_1 \stackrel{k_1}{\leftarrow} A_2 \stackrel{k_2}{\leftarrow} \ldots \stackrel{k_{n-1}}{\leftarrow} A_n \stackrel{k_n}{\leftarrow} A_{1_{t-1}}$

by introducing a state input node $A_{1_{t-1}}$.

As illustrated earlier, a two-slice DBN is a snapshot of a stationary DBN across any
two time slices, which can be obtained by traversing the stationary DBN from a set of state
variables backward to the same set of state variables (i.e., state input nodes). This process
corresponds to generating an influence network $\mathcal{I}_{net}(KB)_{Q,E}$ from $\mathcal{I}_{clause}(KB)$ incrementally
(adding nodes and edges one at a time) while wrapping up loop nodes with state input nodes.
This leads to the following algorithm for building a two-slice DBN structure, $2S_{net}(KB)_{Q,E}$,
directly from $\mathcal{I}_{clause}(KB)$, where Q, E and TOP are the same as defined in Algorithm 3.

Algorithm 4: Building a two-slice DBN structure.

1. Initially, $2S_{net}(KB)_{Q,E}$ has all random variables in TOP as nodes.

2. Remove the first random variable A from TOP. For each influence clause in $\mathcal{I}_{clause}(KB)$
of the form $k.\ A \leftarrow A_1, \ldots, A_l$, if $l = 0$ then add to $2S_{net}(KB)_{Q,E}$ an edge
$A \stackrel{k}{\leftarrow}$. Otherwise, for each $A_i$ in the body

(a) If $A_i$ is not in $2S_{net}(KB)_{Q,E}$ then add $A_i$ to $2S_{net}(KB)_{Q,E}$ as a new node and add
it to the end of TOP.

(b) If adding $A \stackrel{k}{\leftarrow} A_i$ to $2S_{net}(KB)_{Q,E}$ produces a loop, then add to $2S_{net}(KB)_{Q,E}$ a
node $A_{i_{t-1}}$ and an edge $A \stackrel{k}{\leftarrow} A_{i_{t-1}}$, else add an edge $A \stackrel{k}{\leftarrow} A_i$ to
$2S_{net}(KB)_{Q,E}$.

3. Repeat step 2 until TOP becomes empty.

4. Return $2S_{net}(KB)_{Q,E}$.
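
Algorithm 4 can be sketched in the same style. This is a minimal illustration under stated assumptions, not the authors' code. The clause encoding, the names `creates_loop` and `build_two_slice_dbn`, and the `"@t-1"` suffix for state input nodes are hypothetical, and "produces a loop" is read here as "closes a directed cycle in the network built so far". The clause list is reconstructed from Example 4.3 by dropping the $t-1$ subscripts.

```python
from collections import deque

# Assumed encoding of the influence clauses of KB_1: (k, head, body).
KB1_CLAUSES = [
    (1, "aids(p1)", ()),
    (2, "aids(p3)", ()),
    (3, "aids(p1)", ("aids(p1)",)),
    (3, "aids(p2)", ("aids(p2)",)),
    (3, "aids(p3)", ("aids(p3)",)),
    (4, "aids(p2)", ("aids(p1)", "contact(p2,p1)")),
    (4, "aids(p1)", ("aids(p2)", "contact(p1,p2)")),
    (5, "contact(p1,p2)", ()),
    (6, "contact(p2,p1)", ()),
]

def creates_loop(children, a, b):
    """True if the edge a <- b would close a cycle, i.e. b is a itself
    or b is already a descendant of a."""
    stack, seen = [a], set()
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(children.get(x, ()))
    return False

def build_two_slice_dbn(clauses, top):
    """Return (nodes, edges); an edge (a, k, b) stands for a <-k- b."""
    nodes, queue = list(top), deque(top)
    edges, children = set(), {}       # children: parent -> set of child nodes
    while queue:
        a = queue.popleft()
        for k, head, body in clauses:
            if head != a:
                continue
            if not body:
                edges.add((a, k, None))
                continue
            for b in body:
                if b not in nodes:                    # step 2(a)
                    nodes.append(b)
                    queue.append(b)
                if creates_loop(children, a, b):      # step 2(b): cut the loop
                    parent = b + "@t-1"               # state input node
                    if parent not in nodes:
                        nodes.append(parent)
                else:
                    parent = b
                edges.add((a, k, parent))
                children.setdefault(parent, set()).add(a)
    return nodes, edges

nodes, edges = build_two_slice_dbn(
    KB1_CLAUSES, ["aids(p1)", "aids(p2)", "aids(p3)"])
state_inputs = sorted(n for n in nodes if n.endswith("@t-1"))
```

With TOP ordered as aids(p1), aids(p2), aids(p3), this run introduces three state input nodes and cuts the aids(p1)/aids(p2) loop on the edge into aids(p2). A different processing order may cut the loop elsewhere, in line with the footnote to Example 4.3.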

Example 5.2 (Example 5.1 continued) To build a two-slice DBN structure from $KB_1$
that covers aids(p1), aids(p2) and aids(p3), we apply Algorithm 4 to $\mathcal{I}_{clause}(KB_1)$ while
letting TOP = {aids(p1), aids(p2), aids(p3)}. It generates $2S_{net}(KB_1)_{Q,E}$ as shown in Figure
7. Note that loops are cut by introducing three state input nodes $aids(p1)_{t-1}$, $aids(p2)_{t-1}$
and $aids(p3)_{t-1}$. The two-slice DBN structure concisely depicts a feedback system where the
feedback connections are as shown in Figure 8.

Figure 7: A two-slice DBN structure built from the AIDS program $KB_1$.

Figure 8: The feedback connections created by the AIDS program $KB_1$.



Algorithm 4 is Algorithm 3 enhanced with a mechanism for cutting loops (item 2(b)); i.e.,
when adding the current edge $A \stackrel{k}{\leftarrow} A_i$ to the network forms a loop, we replace it with an edge
$A \stackrel{k}{\leftarrow} A_{i_{t-1}}$, where $A_{i_{t-1}}$ is a state input node. This is a process of transforming influence
clauses (6) to (8). Therefore, $2S_{net}(KB)_{Q,E}$ can be viewed as an influence network built
from a generalized set $\mathcal{I}_{clause}(KB)_g$ of influence clauses. Let $S(KB)_g$ be the set of random
variables in $\mathcal{I}_{clause}(KB)_g$, as defined in Theorem 4.10. Let $2S_{net}(KB)_{S(KB)}$ denote a
two-slice DBN structure (produced by applying Algorithm 4) that covers all random variables in
$S(KB)_g$. We then have the following immediate result from Theorem 5.4.

Theorem 5.5 When $\mathcal{I}_{clause}(KB)$ produces cyclic influences, the probability distribution
induced by KB can be computed over $2S_{net}(KB)_{S(KB)}$. That is,
$P(S(KB)_g) = \prod_{A_i \in S(KB)} P(A_i \mid pa(A_i)) = \prod_{A_i \in S(KB)} P(A_i \mid parents(A_i))$
under the independence assumption.






Remark 5.1 Note that Algorithm 4 produces a DBN structure without using any explicit
time parameters. It only requires the user to specify, via the query and evidences, what
random variables are necessarily included in the network. Algorithm 4 builds a two-slice
DBN structure for any given query and evidences whose random variables are heads of some
influence clauses in $\mathcal{I}_{clause}(KB)$. When no query and evidences are provided, we may apply
Algorithm 4 to build a complete two-slice DBN structure, $2S_{net}(KB)_{S(KB)}$, which covers the
space S(KB) of random variables, by letting TOP consist of all heads of influence clauses
in $\mathcal{I}_{clause}(KB)$. This is a very useful feature, as in many situations the user may not be able
to present the right queries unless a Bayesian network structure is shown.

Also note that when there is no cyclic influence, Algorithm 4 becomes Algorithm 3 and 
thus it builds a regular Bayesian network structure. 

5.2 Building CPTs 

After a Bayesian network structure $2S_{net}(KB)_{Q,E}$ has been constructed from a Bayesian
knowledge base KB, we associate each (non-state-input) node A in the network with a
CPT. There are three cases. (1) If A (as a head) only has unit clauses in $\mathcal{I}_{clause}(KB)$, we
build from the unit clauses a prior CPT for A as its prior probability distribution. (2) If A
only has non-unit clauses in $\mathcal{I}_{clause}(KB)$, we build from the clauses a posterior CPT for A
as its posterior probability distribution. (3) Otherwise, we prepare for A both a prior CPT
(from the unit clauses) and a posterior CPT (from the non-unit clauses). In this case, A is
attached with the posterior CPT; the prior CPT for A would be used, if A is a state variable,
as the probability distribution of A in time slice 0 (only in the case that a two-slice DBN is
unrolled into a stationary DBN starting with time slice 0).

Assume that the parent nodes of A are derived from n ($n \geq 1$) different influence clauses
in $\mathcal{I}_{clause}(KB)$. Suppose these clauses share the following CPTs in $T_x$:
$P(A_1 \mid B^1_1, \ldots, B^1_{m_1})$, ..., and $P(A_n \mid B^n_1, \ldots, B^n_{m_n})$. (Recall that an influence
clause prefixed with a number k shares the CPT attached to the k-th clause in PB.) Then the
CPT for A is computed by combining the n CPTs in terms of the combination rule CR
specified in Definition 3.1.

Example 5.3 (Example 5.2 continued) Let $CPT_i$ denote the CPT attached to the i-th
clause in $PB_1$. Consider the random variables in $2S_{net}(KB_1)_{Q,E}$. Since aids(p1) has three
parent nodes, derived from the 3rd and 4th clauses in $PB_1$ respectively, the posterior CPT for
aids(p1) is computed by combining $CPT_3$ and $CPT_4$. aids(p1) also has a prior CPT, $CPT_1$,
derived from the 1st clause in $PB_1$. For the same reason, the posterior CPT for aids(p2)
is computed by combining $CPT_3$ and $CPT_4$. The posterior CPT for aids(p3) is $CPT_3$ and
its prior CPT is $CPT_2$. contact(p1, p2) and contact(p2, p1) have only prior CPTs, namely
$CPT_5$ and $CPT_6$. Note that state input nodes, $aids(p1)_{t-1}$, $aids(p2)_{t-1}$ and $aids(p3)_{t-1}$,
do not need to have a CPT; they will be expanded, during the process of unrolling the
two-slice DBN into a stationary DBN, to cover the time slices involved in the given query
and evidence nodes. If the resulting stationary DBN starts with time slice 0, the prior
CPTs, $CPT_{aids(p1)_0}$ and $CPT_{aids(p3)_0}$, for aids(p1) and aids(p3) are used as the probability
distributions of $aids(p1)_0$ and $aids(p3)_0$.

Note that aids(p2) is a state variable, but there is no unit influence clause available
to build a prior CPT for it. We have two ways to derive a prior CPT, $CPT_{aids(p2)_0}$, for
aids(p2) from some existing CPTs. (1) $CPT_{aids(p2)_0}$ comes from averaging $CPT_{aids(p1)_0}$ and
$CPT_{aids(p3)_0}$. For instance, let the probability of aids(p1) = yes be 0.7 in $CPT_{aids(p1)_0}$ and the
probability of aids(p3) = yes be 0.74 in $CPT_{aids(p3)_0}$. Then the probability of aids(p2) = yes
is (0.7 + 0.74)/2 = 0.72 in $CPT_{aids(p2)_0}$. (2) $CPT_{aids(p2)_0}$ comes from averaging the posterior
probability distributions of aids(p2). For instance, let {0.9, 0.7, 0.4, 0.8} be the posterior
probabilities of aids(p2) = yes in the posterior CPT for aids(p2). Then the probability of
aids(p2) = yes is (0.9 + 0.7 + 0.4 + 0.8)/4 = 0.7 in $CPT_{aids(p2)_0}$.
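
The two averaging options above amount to simple arithmetic, which the following sketch reproduces. The helper name `average` is an assumption introduced for illustration; the input probabilities are the ones quoted in the text.

```python
def average(probabilities):
    """Arithmetic mean of a list of probabilities."""
    return sum(probabilities) / len(probabilities)

# Option (1): average the prior probabilities of aids(p1) = yes and aids(p3) = yes.
prior_from_priors = average([0.7, 0.74])

# Option (2): average the posterior probabilities of aids(p2) = yes.
prior_from_posteriors = average([0.9, 0.7, 0.4, 0.8])

print(round(prior_from_priors, 2), round(prior_from_posteriors, 2))
```

Up to floating-point rounding, the two results agree with the values 0.72 and 0.7 given in the text.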

6 Related Work 

A recent overview of existing representational frameworks that combine probabilistic
reasoning with logic (i.e., logic-based approaches) or with relational representations (i.e.,
non-logic-based approaches) is given by De Raedt and Kersting. Typical non-logic-based
approaches include probabilistic relational models (PRM), which are based on the
entity-relationship (or object-oriented) model [12, 15, 22], and relational Markov networks, which
combine Markov networks and SQL-like queries [SD]- Representative logic-based approaches 
include frameworks based on the KBMC (Knowledge-Based Model Construction) idea (HI 
HI Uni E3 EH EH 1201 1231) stochastic logic programs (SLP) based on stochastic context-free 
grammars, parameterized logic programs based on distribution semantics (PRISM) [26], and
more. Most recently, a unifying framework, called Markov logic, has been proposed
by Domingos and Richardson. Markov logic subsumes first-order logic and Markov
networks. Since our work follows the KBMC idea, focusing on how to build a Bayesian network
directly from a logic program, it is closely related to three representative existing PLP
approaches: the context-sensitive PLP developed by Haddawy and Ngo [20], Bayesian logic
programming proposed by Kersting and De Raedt [17], and the time parameter-based approach
presented by Glesner and Koller [13]. In this section, we make a detailed comparison of our
work with the three closely related approaches.

6.1 Comparison with the Context-Sensitive PLP Approach 

The core of the context-sensitive PLP is a probabilistic knowledge base (PKB). In order 
to see the main differences from our Bayesian knowledge base (BKB), we reformulate its 
definition here. 

Definition 6.1 A probabilistic knowledge base is a four tuple <PD, PB, CB, CR>, where

• PD defines a set of probabilistic predicates (p-predicates) of the form $p(T_1, \ldots, T_m, V)$
where all arguments $T_i$s are typed with a finite domain and the last argument V takes
on values from a probabilistic domain $DOM_p$.

• PB consists of probabilistic rules of the form

    $P(A_0 \mid A_1, \ldots, A_l) = \alpha \leftarrow B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$          (11)

where $0 \leq \alpha \leq 1$, the $A_i$s are p-predicates, and the $B_j$s and $C_k$s are context predicates
(c-predicates) defined in CB.

• CB is a logic program, and both PB and CB are acyclic.

• CR is a combination rule.

In a probabilistic rule (11), each p-predicate $A_i$ is of the form $q(t_1, \ldots, t_m, v)$, which
simulates an equation $q(t_1, \ldots, t_m) = v$ with v being a value from the probabilistic domain
of $q(t_1, \ldots, t_m)$. For instance, let $D_{color}$ = {red, green, blue} be the probabilistic domain of
color(X); then the p-predicate color(X, red) simulates color(X) = red, meaning that the
color of X is red. The left-hand side $P(A_0 \mid A_1, \ldots, A_l) = \alpha$ expresses that the probability of
$A_0$ conditioned on $A_1, \ldots, A_l$ is $\alpha$. The right-hand side $B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ is the context
of the rule, where the $B_j$s and $C_k$s are c-predicates. Note that the sets of p-predicate and
c-predicate symbols are disjoint. A separate logic program CB is used to evaluate the context
of a probabilistic rule. As a whole, the above probabilistic rule states that for each of its
(Herbrand) ground instances

    $P(A'_0 \mid A'_1, \ldots, A'_l) = \alpha \leftarrow B'_1, \ldots, B'_m, \neg C'_1, \ldots, \neg C'_n$

if the context $B'_1, \ldots, B'_m, \neg C'_1, \ldots, \neg C'_n$ is true in CB under the program completion semantics,
the probability of $A'_0$ conditioned on $A'_1, \ldots, A'_l$ is $\alpha$.

PKB and BKB have the following important differences. 

First, probabilistic rules of form (11) in PKB contain both logic representation (right-hand
side) and probabilistic representation (left-hand side) and thus are not logic clauses.
The logic part and the probabilistic part of a rule are separately computed against CB and
PB, respectively. In contrast, BKB uses logic clauses of the form
$A \leftarrow A_1, \ldots, A_l, true, Context, Types$, which naturally integrate
the direct influence information, the context and the type constraints. These logic clauses
are evaluated against a single logic program $PB \cup CB$, while the probabilistic information
is collected separately in $T_x$.

Second, logic reasoning in PKB relies on the program completion semantics and is car- 
ried out by applying SLDNF-resolution. But in BKB, logic inferences are based on the 
well-founded semantics and are performed by applying SLG-resolution. The well-founded 
semantics resolves the problem of inconsistency with the program completion semantics, 
while SLG-resolution eliminates the problem of infinite loops with SLDNF-resolution. Note
that the key significance of BKB using the well-founded semantics lies in the fact that a 
unique set of influence clauses can be derived, which lays a basis on which both the declar- 
ative and procedural semantics for BKB are developed. 

Third, and most importantly, PKB has no mechanism for handling cyclic influences. In PKB,
cyclic influences are defined to be inconsistent (see Definition 9 of the paper [20]) and thus 
are excluded (PKB excludes cyclic influences by requiring its programs be acyclic). In BKB, 
however, cyclic influences are interpreted as feedbacks, thus implying a time sequence. This 
allows us to derive a stationary DBN from a logic program with recursive loops. 

Recently, Fierens, Blockeel, Ramon and Bruynooghe introduced logical Bayesian
networks (LBN). LBN is similar to PKB except that it separates logical and probabilistic
information. That is, LBN converts rules of form (11) into the form

    $A_0 \mid A_1, \ldots, A_l \leftarrow B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$

where the $A_i$s are p-predicates with the last argument V removed, and the $B_j$s and $C_k$s
are c-predicates defined in CB. This is not a standard clause as defined in
logic programming [18]. Like PKB, LBN differs from BKB in the following: (1) it has
no mechanism for handling cyclic influences (see Section 3.2 of their paper), and (2)
although the well-founded semantics is also used for the logic contexts, neither declarative
nor procedural semantics for LBN has been formally developed.

6.2 Comparison with Bayesian Logic Programming 

Building on Ngo and Haddawy's work, Kersting and De Raedt introduce the framework
of Bayesian logic programs. A Bayesian logic program (BLP) is a triple <P, $T_x$, CR> where
P is a well-defined logic program, $T_x$ consists of CPTs associated with each clause in P,
and CR is a combination rule. A distinct feature of BLP over PKB is its separation of
probabilistic information ($T_x$) from logic clauses (P). According to [17], we understand
that a well-defined logic program is an acyclic positive logic program satisfying the range
restriction.$^9$ For instance, a logic program containing clauses like r(X) <- r(X) (cyclic) or
r(X) <- s(Y) (not range-restricted) is not well-defined. BLP relies on the least Herbrand
model semantics and applies SLD-resolution to make backward-chaining inferences.

BLP has two important differences from BKB. First, it applies only to positive logic 
programs. Due to this, it cannot handle contexts with negated atoms. (In fact, no contexts 
are considered in BLP.) Second, it does not allow cyclic influences. BKB can be viewed as 
an extension of BLP with mechanisms for handling contexts and cyclic influences in terms 
of the well-founded semantics. Such an extension is clearly nontrivial. 

9 A logic program is said to be range-restricted if all variables appearing in the head of a clause appear in 
the body of the clause. 

6.3 Comparison with the Time Parameter-Based Approach

The time parameter-based framework (TPF) proposed by Glesner and Koller [13] is also a
triple <P, $T_x$, CR>, where CR is a combination rule, $T_x$ is a set of CPTs that are represented
as decision trees, and P is a logic program with the property that each predicate contains a
time parameter and that in each clause the time argument in the head is at least one time step
later than the time arguments in the body. This framework is implemented in Prolog, i.e.,
clauses are represented as Prolog rules and goals are evaluated by applying SLDNF-resolution.
Glesner and Koller [13] state: "... In principle, this free variable Y can be instantiated with
every domain element. (This is the approach taken in our implementation.)" By this we
understand that they consider typed logic programs with finite domains.

We observe the following major differences between TPF and BKB. First, TPF is a 
temporal model and its logic programs contain a time argument for every predicate. It 
always builds a DBN from a logic program even if there is no cyclic influence. In contrast, 
logic programs in BKB contain no time parameters. When there is no cyclic influence, 
BKB builds a regular Bayesian network from a logic program (in this case, BKB serves as a 
non-temporal model); when cyclic influences occur, it builds a stationary DBN, represented 
by a two-slice DBN (in this case, BKB serves as a special temporal model). Second, TPF 
uses time steps to describe direct influences (in the way that for any A and B such that B 
has a direct influence on A, the time argument in B is at least one time step earlier than 
that in A), while BKB uses time slices (implied by recursive loops) to model
cycles of direct influences (feedbacks). Time-step-based frameworks like TPF are suitable
for modeling flexible DBNs, whereas time-slice-based approaches like BKB apply to stationary
DBNs. Third, and most importantly, TPF avoids recursive loops by introducing time parameters
to enforce acyclicity of a logic program. A serious problem with this method is that it 
may lose and/or produce wrong answers to some queries. To explain this, let P be a logic 
program and Pt be P with additional time arguments added to each predicate (as in TPF). 
If the transformation from P to Pt is correct, it must hold that for any query p(.) over P, 
an appropriate time argument N = 0,1, 2, ... can be determined such that the query p(., N) 
over P t has the same set of answers as p(.) over P when the time arguments in the answers 
are ignored. It turns out, however, that this condition does not hold in general cases. Note 
that finding an appropriate N for a query p(.) such that evaluating p(., N) over P t (applying 
SLDNF-resolution) yields the same set of answers as evaluating p(.) over P corresponds 
to finding an appropriate depth-bound M such that cutting all SLDNF-derivations for the 
query p(.) at depth M does not lose any answers to p(.). The latter is the well-known loop
problem in logic programming. Since the loop problem is undecidable in general, there is
no algorithm for automatically determining such a depth-bound M (resp. a time argument
N) for an arbitrary query p(.) [21123 EH]- We further illustrate this claim using the following 
example. 

Example 6.1 The following logic program defines a path relation; i.e., there is a path from
X to Y if either there is an edge from X to Y or, for some Z, there is a path from X to Z
and an edge from Z to Y.

P:    1. e(s, b1).
      2. e(b1, b2).
      ...
      99. e(b98, b99).
      100. e(b99, g).
      101. path(X, Y) <- e(X, Y).
      102. path(X, Y) <- path(X, Z), e(Z, Y).

To avoid recursive loops, TPF may transform P into the following program.

$P_t$:  1. e(s, b1, 0).
      2. e(b1, b2, 0).
      ...
      99. e(b98, b99, 0).
      100. e(b99, g, 0).
      101. e(X, Y, T1) <- T2 = T1 - 1, e(X, Y, T2).
      102. path(X, Y, T1) <- T2 = T1 - 1, e(X, Y, T2).
      103. path(X, Y, T1) <- T2 = T1 - 1, path(X, Z, T2), e(Z, Y, T2).

$P_t$ looks more complicated than P. In addition to having time arguments and time
formulas, it has a new clause, the 101st clause, formulating that e(X, Y) being true at
present implies it is true in the future.

Let us see how to check if there is a path from s to g. In the original program P,
we simply pose a query ?- path(s, g). In the transformed program $P_t$, however, we have
to determine a specific time parameter N and then pose a query ?- path(s, g, N), such
that evaluating path(s, g) over P yields the same answer as evaluating path(s, g, N) over $P_t$.
Interested readers can practice this query evaluation using different values for N. The answer
to path(s, g) over P is yes. However, we would get an answer no to the query path(s, g, N)
over $P_t$ if we choose any N < 100.
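
The dependence on N can be checked with a small bottom-up evaluation of $P_t$. This is only a sketch under stated assumptions: time points are taken to be the non-negative integers, and the evaluation is done bottom-up rather than by SLDNF-resolution as in TPF. The function names `chain_edges` and `path_holds_at` are hypothetical.

```python
def chain_edges(n=100):
    """The n edge facts of P: s -> b1 -> ... -> b(n-1) -> g."""
    names = ["s"] + [f"b{i}" for i in range(1, n)] + ["g"]
    return list(zip(names, names[1:]))

def path_holds_at(edges, src, dst, n):
    """Is path(src, dst, n) derivable from P_t?

    Clauses 1-101 make every edge fact true at all times T >= 0, so only
    the path atoms need to be computed time step by time step."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)
    paths = set()                  # no path atom is derivable at time 0
    for _ in range(n):
        nxt = set(edges)           # clause 102: one edge at the previous time
        for x, z in paths:         # clause 103: extend a path from the previous time
            for y in succ.get(z, ()):
                nxt.add((x, y))
        paths = nxt
    return (src, dst) in paths

edges = chain_edges()
too_early = path_holds_at(edges, "s", "g", 99)
just_enough = path_holds_at(edges, "s", "g", 100)
```

Under these assumptions, each time step extends derivable paths by at most one edge, so the 100-edge path from s to g becomes derivable only once N reaches 100.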

7 Conclusions and Discussion 

We have developed a novel theoretical framework for deriving a stationary DBN from a logic
program with recursive loops. We observed that recursive loops in a logic program imply
a time sequence and thus can be used to model a stationary DBN without using explicit
time parameters. We introduced a Bayesian knowledge base with logic clauses of the form
$A \leftarrow A_1, \ldots, A_l, true, Context, Types$.
These logic clauses naturally integrate the direct influence information, the context and the
type constraints, and are evaluated under the well-founded semantics. We established a
declarative semantics for a Bayesian knowledge base and developed algorithms that build a
two-slice DBN from a Bayesian knowledge base.
We emphasize the following three points. 

1. Recursive loops (cyclic influences) and recursion through negation are unavoidable in
modeling real-world domains, thus the well-founded semantics together with its top-down
inference procedures is well suited for the PLP application.

2. Recursive loops define feedbacks, thus implying a time sequence. This allows us to 
derive a two-slice DBN from a logic program containing no time parameters. We point 
out, however, that the user is never required to provide any time parameters during 
the process of constructing such a two-slice DBN. A Bayesian knowledge base defines 
a unique space of random variables and a unique set of influence clauses, whether it 
contains recursive loops or not. From the viewpoint of logic, these random variables 
are ground atoms in the Herbrand base; their truth values are determined by the
well-founded model and will never change over time.$^{10}$ Therefore, a Bayesian network is
built over these random variables, independently of any time factors (if any). Once a
two-slice DBN has been built, the time intervals over it would become clearly specified, 
thus the user can present queries and evidences over the DBN using time parameters 
at his/her convenience. 

3. Enforcing acyclicity of a logic program by introducing time parameters is not an ef- 
fective way to handle recursive loops. Firstly, such a method transforms the original 
non-temporal logic program into a more complicated temporal program and builds a 
dynamic Bayesian network from the transformed program even if there exist no cyclic 
influences (in this case, there is no state variable and the original program defines 
a regular Bayesian network). Secondly, it relies on time steps to define (individual) 
direct influences, but recursive loops need time slices (intervals) to model cycles of 
direct influences (feedbacks). Finally, to pose a query over the transformed program, 
an appropriate time parameter must be specified. As illustrated in Example 6.1, there
is no algorithm for automatically determining such a time parameter for an arbitrary 
query. 

Promising future work includes (1) developing algorithms for learning BKB clauses together
with their CPTs from data and (2) applying BKB to model large real-world problems.
We intend to build a large Bayesian knowledge base for traditional Chinese medicine, where 
we already have both a large volume of collected diagnostic rules and a massive repository 
of diagnostic cases. 

10 However, from the viewpoint of Bayesian networks the probabilistic values of these random variables 
(i.e. values from their probabilistic domains) may change over time. 